Word2vec Skip-Gram Dimensionality Selection via Sequential Normalized Maximum Likelihood
Authors
Abstract
In this paper, we propose a novel information criteria-based approach to select the dimensionality of word2vec Skip-gram (SG). From the perspective of probability theory, SG is considered as an implicit probability distribution estimation under the assumption that there exists a true contextual distribution among words. Therefore, we apply information criteria with the aim of selecting the best dimensionality so that the corresponding model can be as close as possible to the true distribution. We examine the following information criteria for the dimensionality selection problem: Akaike's Information Criterion (AIC), Bayesian Information Criterion (BIC), and the Sequential Normalized Maximum Likelihood (SNML) criterion. SNML is the total codelength required for the sequential encoding of a data sequence on the basis of the minimum description length principle. The proposed approach is applied both to the original SG model and to SG with Negative Sampling in order to clarify the idea of using information criteria. Additionally, as SNML suffers from computational disadvantages, we introduce novel heuristics for its efficient computation. Moreover, we empirically demonstrate that SNML outperforms both BIC and AIC. In comparison with other evaluation methods for word embedding, the dimensionality selected by SNML is significantly closer to the optimal dimensionality obtained by word analogy or word similarity tasks.
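The penalized-likelihood criteria named in the abstract can be illustrated with a minimal sketch. All numbers below are hypothetical, not taken from the paper: the log-likelihoods are invented, and the parameter count assumes one parameter per embedding dimension per vocabulary word.

```python
import math

VOCAB = 10_000        # hypothetical vocabulary size
N_TOKENS = 1_000_000  # hypothetical number of training tokens

def aic(log_lik, k):
    # Akaike's Information Criterion: 2k - 2 log L
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    # Bayesian Information Criterion: k ln(n) - 2 log L
    return k * math.log(n) - 2 * log_lik

# hypothetical maximized log-likelihoods per candidate dimensionality
candidates = {50: -9.0e6, 100: -8.4e6, 200: -8.1e6}

best_aic = min(candidates, key=lambda d: aic(candidates[d], d * VOCAB))
best_bic = min(candidates, key=lambda d: bic(candidates[d], d * VOCAB, N_TOKENS))
print(best_aic, best_bic)  # BIC's stronger penalty tends to favor a smaller dimension
```

SNML, as the abstract describes, would replace these penalized-likelihood scores with a sequentially computed codelength, which the paper accelerates with dedicated heuristics.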
Similar resources
Model selection by normalized maximum likelihood
The Minimum Description Length (MDL) principle is an information theoretic approach to inductive inference that originated in algorithmic coding theory. In this approach, data are viewed as codes to be compressed by the model. From this perspective, models are compared on their ability to compress a data set by extracting useful information in the data apart from random noise. The goal of model...
Maximum Likelihood vs. Sequential Normalized Maximum Likelihood in On-line Density Estimation
The paper considers sequential prediction of individual sequences with log loss (online density estimation) using an exponential family of distributions. We first analyze the regret of the maximum likelihood (“follow the leader”) strategy. We find that this strategy is (1) suboptimal and (2) requires an additional assumption about boundedness of the data sequence. We then show that both problem...
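The snippet's second point, that the maximum likelihood ("follow the leader") strategy needs a boundedness assumption, can be seen in a minimal Bernoulli sketch (a standard illustration, not code from the cited paper): the ML plug-in assigns probability zero to an unseen outcome and thus suffers infinite log loss, while an add-one (Laplace) predictor stays finite.

```python
import math

def plugin_prob_of_one(ones, total):
    # ML plug-in estimate; before any data, fall back to 0.5 by convention
    return 0.5 if total == 0 else ones / total

def laplace_prob_of_one(ones, total):
    # add-one smoothing never assigns probability zero
    return (ones + 1) / (total + 2)

def cumulative_log_loss(seq, predictor):
    # total log loss (in bits) of sequentially predicting each symbol
    loss, ones = 0.0, 0
    for t, x in enumerate(seq):
        p1 = predictor(ones, t)
        p = p1 if x == 1 else 1 - p1
        loss += math.inf if p == 0 else -math.log2(p)
        ones += x
    return loss

seq = [0, 0, 0, 1, 0, 1]
print(cumulative_log_loss(seq, plugin_prob_of_one))   # infinite: the first 1 had p = 0
print(cumulative_log_loss(seq, laplace_prob_of_one))  # finite
```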
Dynamic Word Embeddings via Skip-Gram Filtering
We present a probabilistic language model for time-stamped text data which tracks the semantic evolution of individual words over time. The model represents words and contexts by latent trajectories in an embedding space. At each moment in time, the embedding vectors are inferred from a probabilistic version of word2vec (Mikolov et al., 2013b). These embedding vectors are connected in time thro...
On Sequentially Normalized Maximum Likelihood Models
The important normalized maximum likelihood (NML) distribution is obtained via a normalization over all sequences of given length. It has two short-comings: the resulting model is usually not a random process, and in many cases, the normalizing integral or sum is hard to compute. In contrast, the recently proposed sequentially normalized maximum likelihood (SNML) models always comprise a random...
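The sequential normalization the snippet describes can be made concrete for a Bernoulli source (a standard textbook case, not code from the cited paper): at each step, the probability of the next symbol is proportional to the maximized likelihood of the sequence extended by that symbol, and the codelength is the sum of the resulting log losses.

```python
import math

def ml_prob(k, n):
    # maximized Bernoulli likelihood of any sequence with k ones in n trials
    if n == 0:
        return 1.0
    p = k / n
    return (p ** k) * ((1 - p) ** (n - k))

def snml_codelength(bits):
    # total SNML codelength (in bits): normalize the maximized likelihood
    # of the extended sequence over the two possible next symbols
    total, k = 0.0, 0
    for t, x in enumerate(bits):
        num1 = ml_prob(k + 1, t + 1)  # if the next symbol were 1
        num0 = ml_prob(k, t + 1)      # if the next symbol were 0
        p = (num1 if x == 1 else num0) / (num0 + num1)
        total -= math.log2(p)
        k += x
    return total

print(snml_codelength([0] * 10))    # regular sequence: few bits
print(snml_codelength([0, 1] * 5))  # irregular sequence: many more bits
```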
word2vec Skip-Gram with Negative Sampling is a Weighted Logistic PCA
Mikolov et al. (2013) introduced the skip-gram formulation for neural word embeddings, wherein one tries to predict the context of a given word. Their negative-sampling algorithm improved the computational feasibility of training the embeddings. Due to their state-of-the-art performance on a number of tasks, there has been much research aimed at better understanding it. Goldberg and Levy (2014)...
Journal
Journal title: Entropy
Year: 2021
ISSN: 1099-4300
DOI: https://doi.org/10.3390/e23080997